Indexing Internal Memory with Minimal Perfect Hash Functions

نویسندگان

  • Fabiano C. Botelho
  • Hendrickson Reiter Langbehn
  • Guilherme Vale Menezes
  • Nivio Ziviani
چکیده

A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values, which are in turn used to index a hash table. Since no collisions occur, each key can be retrieved from the table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. MPHFs are widely used for memory efficient storage and fast retrieval of items from static sets. Differently from other hashing schemes, MPHFs completely avoid the problem of wasted space and wasted time to deal with collisions. In the past, the amount of space to store an MPHF description was O(logn) bits per key and therefore similar to the overhead of space of other hashing schemes. Recent results on MPHFs by [Botelho et al. 2007] changed this scenario: in their work the space overhead of an MPHF is approximately 2.6 bits per key. The objective of this paper is to show that MPHFs are a good option to index internal memory when static key sets are involved and both successful and unsuccessful searches are allowed. We have shown that MPHFs provide the best tradeoff between space usage and lookup time when compared with linear hashing, quadratic hashing, double hashing, dense hashing, cuckoo hashing and sparse hashing. For example, MPHFs outperforms linear hashing, quadratic hashing and double hashing when these methods have a hash table occupancy of 75% or higher (if the MPHF fits in the CPU cache the same happens for hash table occupancies greater than or equal to 55%). Furthermore, MPHFs also have a better performance in all measured aspects when compared to sparse hashing, which has been designed specifically for efficient memory usage.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Application of Minimal Perfect Hashing in Main Memory Indexing

With the rapid decrease in the cost of random access memory (RAM), it will soon become economically feasible to place full-text indexes of a library in main memory. One essential component of the indexing system is a hashing algorithm, which maps a keyword into the memory address of the index information corresponding to that keyword. This thesis studies the application of the minimal perfect h...

متن کامل

An Approach for Minimal Perfect Hash Functions for Very Large Databases

We propose a novel external memory based algorithm for constructing minimal perfect hash functions h for huge sets of keys. For a set of n keys, our algorithm outputs h in time O(n). The algorithm needs a small vector of one byte entries in main memory to construct h. The evaluation of h(x) requires three memory accesses for any key x. The description of h takes a constant number of up to 9 bit...

متن کامل

Hash and Displace: Efficient Evaluation of Minimal Perfect Hash Functions

A new way of constructing (minimal) perfect hash functions is described. The technique considerably reduces the overhead associated with resolving buckets in two-level hashing schemes. Evaluating a hash function requires just one multiplication and a few additions apart from primitive bit operations. The number of accesses to memory is two, one of which is to a fixed location. This improves the...

متن کامل

A Family of Perfect Hashing Methods

Minimal perfect hash functions are used for memory efficient storage and fast retrieval of items from static sets. We present an infinite family of efficient and practical algorithms for generating order preserving minimal perfect hash functions. We show that almost all members of the family construct space and time optimal order preserving minimal perfect hash functions, and we identify the on...

متن کامل

A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions

We describe the structure of a space-efficient phrase table for phrasebased statistical machine translation with the Moses decoder. The new phrase table can be used in-memory or be partially mapped on-disk. Compared to the standard Moses on-disk phrase table implementation a size reduction by a factor of 6 is achieved. The focus of this work lies on the source phrase index which is implemented ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008